Creating a live, public short message service corpus: the NUS SMS corpus

Authors

  • Tao Chen
  • Min-Yen Kan
Abstract

Short Message Service (SMS) messages are short messages sent from one person to another from their mobile phones. They represent a means of personal communication that is an important communicative artifact in our current digital era. As most existing studies have used private access to SMS corpora, comparative studies using the same raw SMS data have not been possible up to now. We describe our efforts to collect a public SMS corpus to address this problem. We use a battery of methodologies to collect the corpus, paying particular attention to privacy issues to address contributors’ concerns. Our live project collects new SMS message submissions, checks their quality, and adds valid messages. We release the resultant corpus as XML and as SQL dumps, along with monthly corpus statistics. We opportunistically collect as much metadata about the messages and their senders as possible, so as to enable different types of analyses. To date, we have collected more than 71,000 messages, focusing on English and Mandarin Chinese.


Similar resources

Comparing the Effect of Lecturing and Mobile Phone Short Message Service (SMS), Based on the Theory of Planned Behavior on Improving Nutritional Behaviors of High School Students in the Prevention of Osteoporosis

Background: Osteoporosis is a common and increasingly prevalent disease in which lifestyle plays an important role. This study aimed to compare two educational methods (lecture and texting), based on the theory of planned behavior, for improving osteoporosis-preventive nutritional behaviors in high school students. Materials and Methods: This semi-experimental study was conducted on 138 female students in who ...


Weak Semi-Markov CRFs for NP Chunking in Informal Text

This paper introduces a new annotated corpus based on an existing informal text corpus: the NUS SMS Corpus (Chen and Kan, 2013). The new corpus includes 76,490 noun phrases from 26,500 SMS messages, annotated by university students. We then explored several graphical models, including a novel variant of the semi-Markov conditional random fields (semi-CRF) for the task of noun phrase chunking. W...


Psychometric properties of the short message service problem use diagnostic questionnaire among Iranian students

Abstract: The excessive use of short message services can lead to traumatic psychological, interpersonal and social consequences. The aim of the present study was to evaluate the psychometric properties of the SMS Problem Use Diagnostic Questionnaire (SMS-PUDQ). The sample consisted of 200 students of Tehran University of Medical Sciences, who were selected through convenience sampling. The...


A translated corpus of 30,000 French SMS

The development of communication technologies has contributed to the appearance of new forms in the written language that scientists have to study according to their peculiarities (typing or viewing constraints, synchronicity, etc). In the particular case of SMS (Short Message Service), studies are complicated by a lack of data, mainly due to technical constraints and privacy considerations. In...


The Effect of Short Message Service on Knowledge of Patients with Diabetes in Yazd, Iran

OBJECTIVE: Diabetes mellitus has shown a tremendous health and social burden worldwide. Better glycemic control in patients with diabetes can be achieved by improving their knowledge which consequently will prevent developing microvascular and neurological complications. Some studies demonstrate effectiveness of Short Message Service (SMS) for patient education. Regarding exponential growth in ...



Journal title:
  • Language Resources and Evaluation

Volume 47  Issue 

Pages  -

Publication date 2013